Abstract

In this paper we propose a new algorithm for finding the
blocks (biconnected components) of an undirected graph. A
serial implementation runs in O(n+m) time and space on a graph
of n vertices and m edges. A parallel implementation runs in
O(log n) time and O(n+m) space using O(n+m) processors on a
concurrent-read, concurrent-write parallel RAM. An alternative
implementation runs in O(n^2/p) time and O(n^2) space using any
number p ≤ n^2/log^2 n of processors, on a concurrent-read,
exclusive-write parallel RAM. The latter algorithm has optimal
speed-up, assuming an adjacency matrix representation of the
input.


Keywords: Parallel graph algorithm, biconnected components,
blocks, spanning tree.

1. Introduction

In  this  paper  we  consider  the  problem  of  computing 
the  blocks  (biconnected  components)  of  a  given  undirected 
graph  G  =  (V,E) .   As  a  model  of  parallel  computation,  we  use 
a  concurrent-read,  concurrent-write  parallel  RAM  (CRCW  PRAM). 
All  the  processors  have  access  to  a  common  memory  and  run 
synchronously.   Simultaneous  reading  by  several  processors 
from  the  same  memory  location  is  allowed  as  well  as  simul- 
taneous writing.   In  the  latter  case  one  processor  succeeds 
but  we  do  not  know  in  advance  which.   This  model,  used  for 
instance  in  [SV  82],  is  a  member  of  a  family  of  models  for 
parallel  computation.   (See  [BK  82],  [SV  81],  [V  81a].) 

We  propose  a  new  algorithm  for  finding  blocks. 
We  discuss  three  implementations  of  the  algorithm: 

1.  A  linear-time  sequential  implementation. 

2. A parallel implementation using O(log n) time, O(n+m)
space, and O(n+m) processors, where n = |V| and m = |E|.

3. An alternative parallel implementation using O(n^2/p) time,
O(n^2) space, and any number p ≤ n^2/log^2 n of processors.
This implementation uses a concurrent-read, exclusive-
write parallel RAM (CREW PRAM). This model differs
from the CRCW PRAM in not allowing simultaneous writing
by more than one processor into the same memory location.
The speed-up of this implementation is optimal in the
sense that the time-processor product is O(n^2), which
is the time required by an optimal sequential algorithm
if the input representation is an adjacency matrix.

Implementation 2 is faster than any of the
previously known parallel algorithms [SJ 81], [Ec 79b],
[TC 82]. Eckstein's algorithm [Ec 79b] uses O(d log^2 n) time
and O((n+m)/d) processors, where d is the diameter of the
graph. The first (resp. second) algorithm of Savage and
Ja'Ja' [SJ 81] uses O(log^2 n) (resp. O((log^2 n) log k)) time,
where k is the number of blocks, and O(n^3/log n) (resp.
O(mn + n^2 log n)) processors. Tsin and Chin's algorithm
[TC 82] matches the bounds of our implementation 3 but
is more complicated. These algorithms use the CREW PRAM
model, which is somewhat weaker than the CRCW PRAM model.
However, Eckstein [Ec 79a] and Vishkin [V 81a] present
general simulation methods that enable us to run
implementation 2 on a CREW PRAM in O(log^2 n) time, without
increasing the number of processors. On sparse graphs, the
resulting algorithm uses fewer processors than either our
implementation 3 or the algorithm of Tsin and Chin.


Each of our implementations readily implies an
algorithm for computing bridges in the same time and number
of processors. This improves on the bridge-finding algorithm
of Savage and Ja'Ja' [SJ 81], which runs in O(log^2 n) time
using O(n^2 log n) processors. Tsin and Chin's algorithm for
bridges matches the bounds of our implementation 3.

We  achieve  our  improvements  through  two  new 
ideas : 

1.  A  block-finding  algorithm  that  uses  any  spanning  tree.   The 
previously  known  linear-time  algorithm  for  finding  blocks 
uses  a  depth-first  spanning  tree  [Ta  72] .   Depth-first 
search  seems  to  be  inherently  serial;  i.e.  there  is  no 
apparent  way  to  implement  it  in  poly-log  parallel  time. 

A  similar  but  more  complicated  block-finding  algorithm 
was  discovered  independently  by  Tsin  [Ts  82]. 

2. New implementation techniques for parallel algorithms on
trees. These techniques allow the computation of various
kinds of information about the tree structure in
O(log n) time, and are likely to have other applications.

The remainder of the paper consists of three
sections. In Section 2 we develop the block-finding algorithm
and give a linear-time sequential implementation. In Section 3
we describe our O(log n)-time parallel implementation. Section
4 sketches our alternative parallel implementation.

Note. Whenever specifying the number of processors used by a
parallel algorithm we ignore the constant factor. We can always
save a constant factor in the number of processors at the
cost of the same constant factor in running time. □

Historical  Remark.   A  variant  of  the  block-finding  algorithm 
presented  here  was  first  discovered  by  R.  Tarjan  in  1974 
[T  82].   U.  Vishkin  independently  rediscovered  a  similar 
algorithm  in  1983  and  proposed  a  parallel  implementation 
[V 83]. Subsequent simplification by the two authors
working together resulted in the present paper. □

2. Finding Blocks

Let G = (V,E) be a connected undirected graph.
Let R be the relation on the edges of G defined by e_1 R e_2
if and only if e_1 = e_2 or e_1 and e_2 are on a common simple
cycle of G. (In this paper a cycle is a path starting and
ending at the same vertex and repeating no edge; a cycle is
simple if it repeats no vertex except the first, which occurs
exactly twice.) It is known that R is an equivalence relation
[Ha 69]. The subgraphs of G induced by the equivalence
classes of R are the blocks (sometimes called biconnected
components) of G. The vertices in two or more blocks are
the cut vertices (sometimes called articulation points) of
G; these are the vertices whose removal disconnects G.
The edges in singleton equivalence classes are the bridges
of G; these are the edges whose removal disconnects G.
(See Figure 1.)

[Figure 1]

We can compute the equivalence classes of R, and
thus the blocks of G, in O(n+m) serial time using depth-first
search [Ta 72], where n = |V| and m = |E|. Unfortunately,
this algorithm seems to have no fast parallel implementation.
In this section we develop an O(n+m)-time serial algorithm
that is suited for parallel implementation. The algorithm
can use any spanning tree, rather than just a depth-first
spanning tree. A similar but more complicated algorithm was
developed independently by Tsin [Ts 82].

We shall define an auxiliary graph G' of G whose
connected components correspond to the blocks of G. The
vertices of G' are the edges of G; if S is a set of edges
in G, S induces a block of G if and only if S induces a
connected component of G'. Let T be any rooted spanning tree of
G. We shall denote the edges of T by v -> w, where v is the
parent of w, denoted by p(w). Let the vertices of T be numbered
from 1 to n in preorder and identify each vertex by its number.
G' contains each edge of G as a vertex and all edges of the
following forms (see Figure 2):

(i)   {{u,w}, {v,w}}, where u -> w is an edge of T and {v,w}
      is an edge of G-T such that v < w.
(ii)  {{u,v}, {x,w}}, where u -> v and x -> w are edges of T
      and {v,w} is an edge of G-T such that v and w are
      unrelated in T.
(iii) {{u,v}, {v,w}}, where u -> v and v -> w are edges of T
      and some edge of G joins a descendant of w with a
      non-descendant of v.


Theorem  1.   Two  edges  of  G  are  in  a  common  block  of  G  if  and 
only  if  as  vertices  of  G'  they  are  in  a  common  connected 
component  of  G'. 

Proof. Any edge {x,y} of G-T defines a simple cycle of G,
consisting of edge {x,y} and the unique path in T joining
x and y. These cycles are a cycle basis of G; the edge set
of any cycle is the mod-two sum of the edge sets of
appropriate basis cycles [Be 73]. Define the relation R' by
e_1 R' e_2 if and only if e_1 and e_2 are two edges of G on a
common basis cycle, and let R'* be the reflexive, transitive
closure of R'.

We claim R'* = R. Since R is an equivalence relation
and R' ⊆ R, we have R'* ⊆ R. To prove the converse, suppose
e_1 R e_2. Then e_1 and e_2 are on a common simple cycle, which
is a mod-two sum of basis cycles C_1, C_2, ..., C_k. Without
loss of generality we can order C_1, C_2, ..., C_k so that C_i for
i > 1 has at least one edge in common with some C_j such that
j < i. (Otherwise the mod-two sum of C_1, C_2, ..., C_k would
induce a disconnected subgraph.) It follows by induction on
k that all edges in C_1, C_2, ..., C_k are equivalent under R'*,
and in particular e_1 R'* e_2. Thus R ⊆ R'*.

Let {u,v} and {x,w} be adjacent in G'. If
Case (i) holds, {u,v} is on the basis cycle defined by
{x,w}. (In this case x = v.) If Case (ii) holds, {u,v}
and {x,w} are on the basis cycle defined by {v,w}. If
Case (iii) holds, say {y,z} is an edge with y a descendant
of w and z a non-descendant of v = x, then {u,v} and {x,w}
are on the basis cycle defined by {y,z}. Thus in all cases
{u,v} and {x,w} are in the same block of G.

Conversely, let {x,y} be an edge of G-T defining
a basis cycle consisting of edge {x,y}, edges on the tree
path from z to x, and edges on the tree path from z to y,
where z is the nearest common ancestor of x and y. Without
loss of generality suppose x < y. By Case (i), {x,y}
and {p(y),y} are adjacent in G'. The existence of {x,y}
implies by Case (iii) that any two consecutive edges on the
tree path from z to x are adjacent in G'. Similarly any two
consecutive edges on the tree path from z to y are adjacent.
If z = x, the tree path from z to x is empty. Otherwise
(i.e. z ≠ x), x and y are unrelated, and by Case (ii)
{p(x),x} and {p(y),y} are adjacent in G'. Thus all edges on
the basis cycle are in the same connected component of G'.
The theorem follows. □

Theorem 1 gives the following O(n+m)-time serial
algorithm for finding blocks:

Step 1. Find a spanning tree T of G using any linear-time
search method. Number the vertices of G from 1 to n in
preorder and identify each vertex by its preorder number.
Compute the number of descendants nd(v) of each vertex v
by processing the vertices in postorder using the recurrence
nd(v) = 1 + Σ{nd(w) | v -> w in T}. (We regard every vertex as
a descendant of itself.) A vertex w is a descendant of
another vertex v if and only if v ≤ w ≤ v + nd(v) - 1 [Ta 74].

Step 2. For each vertex v, compute low(v), the lowest
vertex that is either a descendant of v or adjacent to a
descendant of v by an edge of G-T, and high(v), the highest
vertex that is either a descendant of v or adjacent to a
descendant of v by an edge of G-T. The complete set of
2n low and high vertices can be computed in O(n+m) time by
processing the vertices of T in postorder using the
following recurrences:

low(v) = min({v} ∪ {low(w) | v -> w in T} ∪ {w | {v,w} in G-T});

high(v) = max({v} ∪ {high(w) | v -> w in T} ∪ {w | {v,w} in G-T}).

Step 3. Construct G'', the subgraph of G' induced by the
edges of T, as follows. (The edges of G'' are those implied by
Cases (ii) and (iii).) For each edge {w,v} in G-T such that
v + nd(v) ≤ w, add {{p(v),v}, {p(w),w}} to G''
(Case (ii)). For each edge v -> w of T such that v ≠ 1,
add {{p(v),v}, {v,w}} to G'' if low(w) < v or high(w) ≥ v +
nd(v) (Case (iii)).

Step 4. Find the connected components of G'' using any kind
of linear-time search.

Step 5. Extend the equivalence relation on the edges of T
(the vertices of G'') to the edges of G-T by defining {v,w}
equivalent to {p(w),w} for each edge {v,w} of G-T such
that v < w (Case (i)).

It is easy to implement this algorithm to run in
O(n+m) time using standard techniques. (See [Ta 72].) If
only a serial implementation is desired, the algorithm
can be simplified somewhat. (See [Ta 82].) The algorithm
as presented is designed for easy parallel implementation.
Note that each edge of G-T is a vertex of degree one in
G', and G'' contains n-1 vertices and at most m-1 edges.
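Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes a connected simple graph on vertices 0..n-1 given as a list of undirected edges. The function name find_blocks is hypothetical. The spanning tree is built by breadth-first search, to stress that any spanning tree will do. The vertices are processed in reverse preorder in place of postorder. A small union-find structure stands in for the linear-time component search of Step 4.

```python
from collections import defaultdict, deque

def find_blocks(n, edges):
    """Return the blocks of a connected simple graph as a set of edge sets."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    # Step 1: any spanning tree (here BFS from vertex 0), then preorder numbers.
    parent, children, queue = {0: None}, defaultdict(list), deque([0])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in parent:
                parent[b] = a
                children[a].append(b)
                queue.append(b)
    order, stack = [], [0]
    while stack:
        a = stack.pop()
        order.append(a)
        stack.extend(reversed(children[a]))
    pre = {a: k + 1 for k, a in enumerate(order)}   # preorder numbers 1..n
    p = [0] * (n + 1)                               # p[w] = parent of w
    for a in order[1:]:
        p[pre[a]] = pre[parent[a]]
    nontree = [tuple(sorted((pre[a], pre[b]))) for a, b in edges
               if parent[a] != b and parent[b] != a]

    # Steps 1-2: nd, low, high; reverse preorder visits children before parents.
    nd = [1] * (n + 1)
    low = list(range(n + 1))
    high = list(range(n + 1))
    for v, w in nontree:
        low[v], low[w] = min(low[v], w), min(low[w], v)
        high[v], high[w] = max(high[v], w), max(high[w], v)
    for w in range(n, 1, -1):
        v = p[w]
        nd[v] += nd[w]
        low[v] = min(low[v], low[w])
        high[v] = max(high[v], high[w])

    # Steps 3-4: components of G''; tree edge {p(w), w} is named by its child w.
    comp = list(range(n + 1))
    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x
    for v, w in nontree:                            # Case (ii), v < w
        if v + nd[v] <= w:
            comp[find(v)] = find(w)
    for w in range(2, n + 1):                       # Case (iii)
        v = p[w]
        if v != 1 and (low[w] < v or high[w] >= v + nd[v]):
            comp[find(v)] = find(w)

    # Step 5: attach each nontree edge {v, w}, v < w, to the class of {p(w), w}.
    groups = defaultdict(set)
    for w in range(2, n + 1):
        groups[find(w)].add(frozenset((order[p[w] - 1], order[w - 1])))
    for v, w in nontree:
        groups[find(w)].add(frozenset((order[v - 1], order[w - 1])))
    return {frozenset(g) for g in groups.values()}
```

On two triangles that share a vertex, with one pendant edge attached, the sketch reports three blocks: the two triangles and the bridge.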

Remark. Although we have assumed that G is connected, we
can use the algorithm to find the blocks of a disconnected
graph by applying it to each of the connected components
(in series in the case of the implementation in this
section, in parallel in the case of the implementations in
Sections 3 or 4). This does not change the resource
bounds of the algorithm. □

3. A Fast Parallel Implementation

In this section we describe how to implement the
block-finding algorithm of Section 2 to run in O(log n)
time with O(n+m) processors on a CRCW PRAM. We shall
emphasize the ideas involved, only sketching the details.
As the input representation, we assume that the vertex set
is V = {1,2,...,n} and that each undirected edge {i,j} is
represented by two directed edges (i,j) and (j,i). Each
vertex i has a list of its outgoing edges: adj(i) points
to the first such edge and next((i,j)) points to the edge
after (i,j) on i's list. (If there is no such edge,
next((i,j)) = null.) Each edge (i,j) also has a pointer to
its reversal (j,i). Each vertex i and each directed edge
(i,j) has its own processor, denoted by pr(i) and pr((i,j)),
respectively.

Remark. This input representation is the most convenient
one for our purposes, but it is not the only one that will
work. For example, we can begin with an array of the 2m
directed edges in arbitrary order and use the O(log m)-time,
O(m)-processor sorting algorithm of Ajtai, Komlós, and
Szemerédi [AKS 83] to sort the edges by first component.
Once the edges are sorted, it is easy to construct incidence
lists. Sorting the edges (i,j) lexicographically on
(min{i,j}, max{i,j}) allows the construction of pointers
between each edge and its reversal. Thus we obtain the
desired input representation. □

Step 1. Construction of a spanning tree and computation of
the preorder number and number of descendants of each vertex.

First we construct an unrooted spanning tree by
using a modification of the Shiloach-Vishkin connected
components algorithm [SV 82]. We assume some
familiarity with this algorithm. The algorithm maintains
for each vertex v a pointer D(v). Initially D(v) = v for
all vertices v. As the algorithm proceeds, the D-pointers
are the parent pointers of a forest, each tree of which
contains vertices known to be in a single connected component
of the graph. (If v is the root of a tree in this D-forest,
D(v) = v.) The D-pointers are changed by two kinds of
steps:

Shortcutting. Replace D(i) by D(D(i)) for some vertex i.
Such a step changes the structure of the D-forest by moving
i and its descendants closer to the root of its tree, but
does not change the vertex partition defined by the D-trees.

Hooking. Replace D(D(i)) by D(j), where D(i) is the root
of a D-tree, j is a vertex in another D-tree, and {i,j} is
an edge in the graph.

We modify the Shiloach-Vishkin algorithm so that
all the edges are initially marked as non-tree edges, and
each time a hooking step is performed, the corresponding
graph edge {i,j} is marked as a tree edge. When the
algorithm finishes, all the vertices are in a single D-tree,
and the marked edges define a spanning tree. The original
algorithm runs in O(log n) time using O(n+m) processors;
these bounds are not affected by the modifications for
computing a spanning tree.

One detail of this method deserves further discussion.
Processors corresponding to several directed edges
(i,j) may simultaneously try to write to the same location
D(D(i)) to cause a hooking, but only one succeeds. In order
to keep track of which one succeeds, we use an auxiliary
array a. When a processor pr((i,j)) tries to cause a
hooking step to take place, it first writes its name
into a(D(i)) by the assignment a(D(i)) <- pr((i,j)). For a
fixed value of D(i), only one such processor succeeds.
The successful processor pr((i,j)) then carries out the
actual hooking step and marks both (i,j) and (j,i).
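The interplay of hooking, shortcutting, and the auxiliary array a can be illustrated by a sequential simulation. The sketch below is a deliberately simplified stand-in, not the Shiloach-Vishkin algorithm itself. It shortcuts until every D-tree is a star, lets a root hook only onto a smaller-numbered root (an assumption made here to rule out cycles), and resolves concurrent writes by letting the last writer win. It therefore does not reproduce the O(log n) round bound. The function name spanning_tree_by_hooking is hypothetical.

```python
def spanning_tree_by_hooking(n, edges):
    """Return the edges marked as tree edges; vertices are 0..n-1."""
    D = list(range(n))                       # D-pointers, initially D(v) = v
    directed = list(edges) + [(j, i) for i, j in edges]
    tree = []
    while True:
        # Shortcutting: replace D(v) by D(D(v)) until every D-tree is a star.
        while any(D[D[v]] != D[v] for v in range(n)):
            D = [D[D[v]] for v in range(n)]
        # Each directed edge (i, j) that could hook root D(i) writes its name
        # into a[D(i)]; one writer per location survives (here: the last one).
        a = {}
        for i, j in directed:
            if D[j] < D[i]:
                a[D[i]] = (i, j)
        if not a:                            # no edge joins two D-trees
            return tree
        # The winning processors perform the hooking and mark their edges.
        new_D = D[:]
        for root, (i, j) in a.items():
            new_D[root] = D[j]
            tree.append((i, j))
        D = new_D
```

Each hooking joins two distinct D-trees and roots only hook downwards in the numbering, so the marked edges never close a cycle; on a connected graph exactly n-1 edges are marked.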

Remark. This idea for obtaining a spanning tree from a
connected components computation has been used before.
In particular Savage and Ja'Ja' [SJ 81] used it to derive a
minimum spanning forest algorithm from the connectivity
algorithm of Hirschberg, Chandra and Sarwate [HCS 79]. □

Having constructed an unrooted spanning tree,
we must determine a root and number the vertices in
preorder. First, we construct for each vertex i a list
of the outgoing edges corresponding to tree edges. We can
do this in O(log m) = O(log n) time with O(m) processors
by using a "doubling" technique [W 79]. For each edge
(i,j), we initialize treenext((i,j)) = next((i,j)) and then
repeat the following step, in parallel on all edges (i,j),
until none of the treenext values change: if treenext((i,j))
is not null and not marked, replace treenext((i,j)) by
treenext(treenext((i,j))). This takes O(log m) iterations
over the edges. Once all the treenext values are computed,
we define treeadj(i), for each vertex i, to be adj(i) if
adj(i) is null or marked, treenext(adj(i)) otherwise.
The treeadj and treenext maps define incidence lists for
the spanning tree.
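The treenext computation can be simulated sequentially as follows. The sketch assumes that the edges of one adjacency list are identified by array indices, that nxt[e] is the index of the following edge (None at the end of the list), and that marked[e] is True for tree edges. These encodings and the name skip_unmarked are illustrative assumptions. Each pass of the outer loop corresponds to one synchronous parallel step.

```python
def skip_unmarked(nxt, marked):
    """Return treenext: for every edge, the next marked edge after it."""
    treenext = list(nxt)
    changed = True
    while changed:
        changed = False
        new = list(treenext)                 # all edges read the old values
        for e, t in enumerate(treenext):
            if t is not None and not marked[t]:
                new[e] = treenext[t]         # jump over an unmarked edge
                changed = True
        treenext = new
    return treenext
```

For example, with nxt = [1, 2, 3, 4, None] and only edges 1 and 4 marked, edge 0 ends up pointing to edge 1, and edges 1, 2 and 3 all end up pointing to edge 4.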

Next, we construct a circular list corresponding
to an Eulerian tour of the directed version of the spanning
tree. For each edge (i,j), the next edge tournext((i,j))
in the tour is treenext((j,i)) if treenext((j,i)) is not
null, treeadj(j) otherwise. This tour corresponds to the
order of advancing and retreating along edges during a
depth-first traversal of the tree, starting at an arbitrary
vertex. To root the tree, we break the Eulerian tour at
an arbitrary edge, causing some edge, say (i,j), to be
the first edge on the list. Vertex i becomes the root of
the tree. We call the broken list the traversal list. We
can number the edges from 1 to 2n-2 in traversal order
in O(log n) time with O(n) processors by using the doubling
technique to compute for each edge (i,j) the number of
edges from (i,j) to the end of the list. We do this by
initializing numtoend((i,j)) = 1 and ptr((i,j)) = tournext((i,j))
for all edges (i,j) and repeating the following computation
in parallel at the edges until ptr((i,j)) = null for all
(i,j): if ptr((i,j)) ≠ null, replace numtoend((i,j)) by
numtoend((i,j)) + numtoend(ptr((i,j))) and ptr((i,j)) by
ptr(ptr((i,j))). Once this computation is complete, the
number of edge (i,j) is 2n-1-numtoend((i,j)).

Of the two edges (i,j) and (j,i), the lower-numbered
one corresponds to an advance from i to j along tree edge
{i,j} and the higher-numbered one to a retreat from j to
i along {i,j}. Using the edge numbers, we can thus mark
each directed edge as either an advance edge or a retreat
edge. For each vertex j other than the tree root, there
is exactly one advance edge (i,j); the parent p(j) of j
in the tree is i.
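A sequential simulation of the tour construction and of the doubling computation of numtoend is sketched below. It assumes the spanning tree is given as a dictionary tree_adj mapping each vertex to the list of its tree neighbours, so that the first list entry plays the role of treeadj and list succession plays the role of treenext. The function name and this encoding are illustrative assumptions, and the tree must have at least two vertices.

```python
def euler_tour_numbers(tree_adj, first):
    """Number the directed tree edges 1..2n-2 in traversal order.

    first is the directed edge (i, j) at which the tour is broken;
    its tail i becomes the root.
    """
    treenext = {}
    for j, nbrs in tree_adj.items():
        for k, i in enumerate(nbrs):
            treenext[(j, i)] = (j, nbrs[k + 1]) if k + 1 < len(nbrs) else None
    # tournext((i,j)) = treenext((j,i)) if not null, treeadj(j) otherwise.
    tournext = {}
    for (i, j) in treenext:
        nxt = treenext[(j, i)]
        tournext[(i, j)] = nxt if nxt is not None else (j, tree_adj[j][0])
    # Break the circular list so that `first` becomes the first edge.
    last = next(e for e, nxt in tournext.items() if nxt == first)
    tournext[last] = None
    # Doubling: each round reads only values from the previous round.
    numtoend = {e: 1 for e in tournext}
    ptr = dict(tournext)
    while any(q is not None for q in ptr.values()):
        new_num, new_ptr = dict(numtoend), dict(ptr)
        for e, q in ptr.items():
            if q is not None:
                new_num[e] = numtoend[e] + numtoend[q]
                new_ptr[e] = ptr[q]
        numtoend, ptr = new_num, new_ptr
    total = len(tournext)                    # 2n - 2 directed edges
    return {e: total + 1 - numtoend[e] for e in tournext}
```

With the returned numbers, (i,j) is an advance edge exactly when its number is smaller than that of (j,i).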

In the traversal list, the advance edges (i,j)
occur in preorder on j. We can thus number the vertices
in preorder using doubling, much as we computed the
edge numbers. The only differences are that we initialize
numtoend((i,j)) to be 1 if (i,j) is an advance edge, 0
otherwise, and when the computation is complete, if (i,j) is an
advance edge, we define n + 1 - numtoend((i,j)) to be the
preorder number of vertex j. Once preorder numbers are
computed, we replace each occurrence of a vertex by its
preorder number, retaining an inverse map to restore the
original vertex names when the computation is complete.
(For each number i, we remember vertex(i), the vertex with
number i.)

Remark. Although not needed in this paper, a similar
computation will number the vertices in postorder; for each vertex
j other than the tree root, there is exactly one retreat edge
(j,i), and the retreat edges appear in the traversal list in
postorder on j. □

The last part of Step 1 is the computation
of the number of descendants nd(j) of each vertex j. If j is
not the tree root, nd(j) is just the number of advance edges
from (p(j),j) to the end of the list (including (p(j),j))
minus the number of advance edges from (j,p(j)) to the end of
the list. Two doubling computations, one of which we have
already done to compute preorder numbers, and a parallel
subtraction give the number of descendants of all the vertices.
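The formulas for preorder numbers and descendant counts can be checked with a short sketch. It assumes the traversal list is already available as a Python list of directed edges in traversal order. A plain suffix sum stands in for the doubling computation. The values assigned to the root (preorder number 1, n descendants) are filled in by the sketch, since the text treats only non-root vertices. The function name is hypothetical.

```python
def preorder_and_nd(traversal):
    """Return (parent, preorder number, number of descendants) per vertex."""
    pos = {e: k for k, e in enumerate(traversal)}
    # (i, j) is an advance edge if it precedes its reversal (j, i).
    advance = [pos[(i, j)] < pos[(j, i)] for (i, j) in traversal]
    # advtoend[k] = number of advance edges from position k to the end.
    advtoend = [0] * (len(traversal) + 1)
    for k in range(len(traversal) - 1, -1, -1):
        advtoend[k] = advtoend[k + 1] + (1 if advance[k] else 0)
    n = len(traversal) // 2 + 1
    root = traversal[0][0]
    parent, pre, nd = {root: None}, {root: 1}, {root: n}
    for k, (i, j) in enumerate(traversal):
        if advance[k]:
            parent[j] = i
            pre[j] = n + 1 - advtoend[k]
            nd[j] = advtoend[k] - advtoend[pos[(j, i)]]
    return parent, pre, nd
```

For the traversal list (0,1), (1,0), (0,2), (2,3), (3,2), (2,0), vertex 2 receives preorder number 3 and has two descendants, itself and vertex 3.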

Step 2. Computation of low(j) and high(j) for each vertex j.

We shall describe how to compute low; the
computation of high is similar. Using doubling on the
adjacency lists, we can compute locallow(j) = min({j} ∪ {k | (j,k)
is an unmarked (nontree) edge}) for each vertex j in O(log n) time
using O(m) processors. We then compute low(j) for each
vertex j using the formula

low(j) = min{locallow(k) | j ≤ k ≤ j + nd(j) - 1}.

The computation of the low values uses a modified
doubling computation. For 1 ≤ j ≤ n, we maintain four
values: low(j), globallow(j), little(j), and big(j),
initially equal to locallow(j + nd(j) - 1), locallow(j), j, and
j + nd(j) - 1, respectively. We use ⌊log_2 n⌋ + 1 iterations
over the vertices, one for each value of i in the range
0 ≤ i ≤ ⌊log_2 n⌋. At the beginning of iteration i,
globallow(j) = min{locallow(k) | j ≤ k ≤ j + 2^i - 1} if j is
divisible by 2^i and j + 2^i - 1 ≤ n. Furthermore, if
little(j) = big(j) then low(j) has its correct value;
otherwise j ≤ little(j) ≤ big(j) ≤ j + nd(j) - 1, both
little(j) and big(j) are divisible by 2^i, and

low(j) = min{locallow(k) | j ≤ k < little(j) or
big(j) ≤ k ≤ j + nd(j) - 1}.

The i-th iteration consists of the
following computation in parallel for each value of j in
the range 1 ≤ j ≤ n:

Step A. If little(j) < big(j) and little(j) ≢ 0 mod 2^(i+1),
replace low(j) by min{low(j), globallow(little(j))} and
little(j) by little(j) + 2^i.

Step B. If little(j) < big(j) and big(j) ≢ 0 mod 2^(i+1),
replace low(j) by min{low(j), globallow(big(j) - 2^i)} and
big(j) by big(j) - 2^i.

Step C. If j ≡ 0 mod 2^(i+1) and j + 2^(i+1) - 1 ≤ n, replace
globallow(j) by min{globallow(j), globallow(j + 2^i)}.

It is easy to verify the correctness of this
computation, which takes O(log n) time using O(n) processors.
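The modified doubling computation can be simulated sequentially as a check on Steps A, B and C. The sketch assumes 1-indexed Python lists locallow and nd whose entry 0 is unused. The name subtree_low is hypothetical. A copy of globallow is kept within each iteration so that Steps A and B read the values from the start of the iteration, as they would on a synchronous PRAM.

```python
def subtree_low(locallow, nd):
    """Return low with low[j] = min(locallow[j .. j + nd[j] - 1])."""
    n = len(locallow) - 1
    globallow = list(locallow)          # globallow[j] covers [j, j + 2^i - 1]
    little = list(range(n + 1))
    big = [0] + [j + nd[j] - 1 for j in range(1, n + 1)]
    low = [0] + [locallow[big[j]] for j in range(1, n + 1)]
    i = 0
    while (1 << i) <= n:                # i = 0, 1, ..., floor(log2 n)
        step = 1 << i
        new_global = list(globallow)
        for j in range(1, n + 1):
            # Step A: absorb the aligned block that starts at little(j).
            if little[j] < big[j] and little[j] % (2 * step) != 0:
                low[j] = min(low[j], globallow[little[j]])
                little[j] += step
            # Step B: absorb the aligned block that ends just before big(j).
            if little[j] < big[j] and big[j] % (2 * step) != 0:
                low[j] = min(low[j], globallow[big[j] - step])
                big[j] -= step
            # Step C: double the length of the block covered by globallow(j).
            if j % (2 * step) == 0 and j + 2 * step - 1 <= n:
                new_global[j] = min(globallow[j], globallow[j + step])
        globallow = new_global
        i += 1
    return low
```

The result can be compared against a direct evaluation of the formula low(j) = min{locallow(k) | j ≤ k ≤ j + nd(j) - 1} on small inputs.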

Step 3. Construction of the auxiliary graph G''.

This computation requires only O(1) time using
O(m) processors, since testing the appropriate condition for
each possible edge of G'' takes O(1) time. After this test,
which takes place in parallel, we have a set of at most
m-1 processors, each of which knows an edge of G''.

Step 4. Finding the connected components of G''.

We apply the connected components algorithm of
Shiloach and Vishkin. The information computed in Step 3
is sufficient as input to this algorithm. Once the
algorithm finishes, each vertex (i,j) of G'' (an advance edge
of the spanning tree) has a D-pointer to a canonical
"vertex" (x,y) representing the connected component
containing (i,j).

Step 5. Extension of the equivalence relation found in
Step 4 to the edges of G-T.

For each non-tree edge (i,j) such that i < j,
we assign D((i,j)) <- D((p(j),j)).

This completes the computation except for restoring
the original vertex names. An inspection of the various
steps shows that none uses more than O(log m) = O(log n)
time, more than O(n+m) space, or more than O(n+m) processors.
The only place concurrent writing is used is in the connected
components algorithm, used in Steps 1 and 4.

4. An Alternative Parallel Implementation

In this section we develop an implementation of
the block-finding algorithm that runs in O(log^2 n) time using
O(n^2/log^2 n) processors on a CREW PRAM, assuming that the
input graph is represented by an adjacency matrix. Since
we can always trade time for processors, this method gives
an O(n^2/p) time algorithm using p processors, for any
p ≤ n^2/log^2 n. This algorithm has optimal speed-up, assuming
an adjacency matrix representation of the input. The
algorithm has the same resource bounds as the similar but
somewhat more complicated algorithm of Tsin and Chin [TC 82].
We shall not go through the details of the implementation but
merely mention where it differs from the O(log n)-time
implementation of the previous section.

There are two known connected components
algorithms that run in O(log^2 n) time using O(n^2/log^2 n)
processors: the algorithm of Vishkin [V 81b], which runs
on a CRCW PRAM, and the algorithm of Chin, Lam, and Chen
[CLC 81], which runs on a CREW PRAM. Although the latter
is more complicated, we shall use it instead of the former in
Steps 1 and 4, since it uses a less powerful computation model.
Chin, Lam, and Chen describe how to adapt their algorithm
to compute a (minimum) spanning forest.

Step 1. Construction of a spanning tree and computation of
the preorder number and number of descendants of each vertex.

We apply the algorithm of Chin, Lam, and Chen to
mark the entries in the adjacency matrix corresponding to
tree edges. We can convert each row of the adjacency matrix
to an incidence list for the corresponding vertex (of edges
incident in the spanning tree) by using a balanced binary
tree with n leaves to guide the computation. (For each
marked entry, we need to compute the next marked entry in
the row.) The computation is similar to a standard
minimization and takes O(log^2 n) time with O(n/log^2 n) processors
(see [K 79]). Since we can carry out the computation for
all rows in parallel, the total time is O(log^2 n) with
O(n^2/log^2 n) processors. Establishing pointers between
each directed edge (i,j) and its reverse is easy. Now
we have the representation of the unrooted spanning tree
used in Section 3. The remainder of the Step 1 computation
proceeds as in Section 3, taking O(log n) time on O(n)
processors.

Step 2. Computation of low and high.

Computing locallow(j) requires n parallel minimum
computations. Each takes O(log^2 n) time using O(n/log^2 n)
processors [W 79], a total of O(n^2/log^2 n) processors.
The remainder of the low computation proceeds as in Section 3,
taking O(log n) time using O(n) processors. The computation
of high is similar.
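The trade of time for processors in such a minimum computation can be illustrated by a two-phase scheme. Each simulated processor first scans a block of about log^2 n entries sequentially, and the per-block results are then combined along a balanced binary tree. This is a generic sketch under that assumption, not the specific procedure of [W 79] or [K 79]. The function name two_phase_min and the returned bookkeeping values are illustrative.

```python
import math

def two_phase_min(values):
    """Return (minimum, block size, number of tree rounds); needs n >= 1."""
    n = len(values)
    block = max(1, math.ceil(math.log2(n)) ** 2)   # about log^2 n entries
    # Phase 1: one simulated processor per block scans it sequentially.
    partial = [min(values[s:s + block]) for s in range(0, n, block)]
    # Phase 2: combine the block minima pairwise, level by level.
    rounds = 0
    while len(partial) > 1:
        partial = [min(partial[k:k + 2]) for k in range(0, len(partial), 2)]
        rounds += 1
    return partial[0], block, rounds
```

With n = 100 the block size is 49, so three simulated processors suffice and the tree phase takes two rounds.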

Step 3. Construction of the auxiliary graph G''.

This is easy in O(log^2 n) time with O(n^2/log^2 n)
processors.

Step 4. Finding the connected components of G''.

We again apply the algorithm of Chin, Lam and
Chen.

Step 5. Extension of the equivalence relation found in
Step 4 to the edges of G-T.

This is easy in O(log^2 n) time with O(n^2/log^2 n)
processors.

We close this section and the paper with a few
remarks about future work. The parallel tree computations
used in Section 3 may have applications in other graph
algorithms. This deserves study. Also, there are still
open problems concerning parallel biconnectivity algorithms.
The algorithm of this section, as does the algorithm of Tsin
and Chin [TC 82], has optimal speed-up for dense graphs
but not for sparse ones, whereas the algorithm of Section 3
is off by a factor of log n from optimal speed-up.
A question worth exploring is whether there is an
O((n+m)/p) time algorithm using p processors, for p sufficiently
small (say p ≤ (n+m)/log n or p ≤ (n+m)/log^2 n). Such an
algorithm is unknown even for the problem of computing connected
components.